ABSTRACT

In the study of the foundations of the subjectivist theory of statistics we find that each aspect of the theory corresponds to a different application of a single manipulation, namely the adjustment of one belief structure by another belief structure. This article describes the technical machinery of this manipulation in sufficient detail to cover all of the applications of the adjusted belief structure in the foundations of the theory. We also discuss the relationship between the adjustment of belief structures and the conditioning of random variables. (Essentially, the latter is a simple special case of the former.)


AMS  (MOS)  Subject  Classifications:  Primary  62A15;  Secondary  60A99 
Key Words: Coherence, conditional prevision, prevision, projection.
ADJUSTED BELIEF STRUCTURES


Michael Goldstein*

1.  Introduction 

Probability theory has an extremely wide range of application. However, in most applications, the basic manipulations are essentially the same, the differences arising from the context within which these manipulations are expressed. Thus, identifying and understanding these basic manipulations is essential for separating out the underlying conceptual issues from the specific technical difficulties of a particular application.

This is particularly important in the study of the foundations, as our aim is to describe the possibilities of the theory. Thus, we shall now identify, and discuss in detail, the single manipulation of the belief structure which we repeatedly require, namely the adjustment of one belief structure by another belief structure. Just as we find that our study of the foundations is naturally expressed in terms of belief structures, so we will find in subsequent articles that each aspect of the theory will simply correspond to a different adjustment of the belief structure.

This article is a sequel to the previous technical report entitled "Belief Structures", and is the second of a series of articles laying the foundation for the subjectivist theory. Our intention in this article is simply to explain the general process of adjustment in sufficient detail to cover all of the various applications of this process that we make in subsequent articles. The notation is as in the previous report.

In order to motivate the construction, we will begin by discussing the simplest example of such an adjustment.
2. Alternative inner products

In our investigations, we choose as fundamental the inner product space $A$ defined by the inner product $(X,Y) = P(XY)$ over the linear space $L$. De Finetti (Theory of Probability, 1974, section 4.17) discusses this space as a simple geometric interpretation of the prevision of products of random variables. He then observes that there is an alternative inner product which is more commonly used, and may appear more natural. This inner product $(\cdot,\cdot)'$ is defined over $L$ by

$$(X,Y)' = P\big((X-P(X))(Y-P(Y))\big) = \mathrm{cov}(X,Y).$$

Let us call the inner product space generated by this inner product $A'$. (Note that we must identify, as equivalence classes, all random quantities which differ by a constant, so that, for example, $X_0$ (the unit constant) is equivalent to the zero vector in $A'$.)

Thus, in $A'$, $\|X\|$ is the standard deviation of $X$. Very loosely, we can consider that vectors with large norms in $A'$ correspond to random quantities whose values you are "very uncertain of", while vectors with small norm are those whose value you are "fairly sure of". The inner product is covariance, and orthogonality corresponds to zero correlation.

Thus, if we wish to consider "relationships" between random quantities, and to express our "degree of uncertainty" about these quantities, then in many ways the space $A'$ seems a more natural object than the space $A$. We will show that, in a certain sense, $A'$ is not a "different" inner product space to $A$, but that $A'$ can more usefully be considered as an "adjustment" of $A$. This is a typical example of the purpose of adjustment, namely to remove certain features of the space $A$ which are not of immediate interest (such as the individual previsions of the elements of $C$), in order to focus attention on aspects which are of interest (such as the "uncertainties", or variances, of the elements of $C$).

As a first step in describing our construction, notice that rather than defining $A$ and $A'$ in terms of different inner products, we can instead view $A'$ as a subspace of $A$ by identifying each (equivalence class) $X$ with the corresponding random quantity "corrected for the mean", i.e. $X - P(X)X_0$. (Note that as members of an equivalence class differ by a constant, each member of a particular class is identified with the same random quantity.) The identification $T(X) = X - P(X)X_0$ preserves the inner product, as for any $X, Y \in A$,

$$(X,Y)' = (T(X),T(Y)).$$

Thus, we will never require the alternative inner product structure $A'$, as whenever we want only to consider variation about the mean we can focus attention on the subspace $A^*$ of $A$ formed by the mean-corrected quantities $T(X)$. Notice in particular that $A$ is the orthogonal sum of $A^*$ and the constant space $A_0$ (i.e. $A = A_0 \oplus A^*$).
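The identification of covariance with the inner product of mean-corrected quantities can be checked numerically. The following is a minimal sketch, assuming a hypothetical specification with four equally likely outcomes; the values of `X` and `Y` and the helper names (`prevision`, `inner`, `centre`) are illustrative and do not come from the report.

```python
from fractions import Fraction as F

# Hypothetical finite specification: four outcomes with probabilities p.
p = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]
X = [F(1), F(2), F(3), F(6)]
Y = [F(0), F(1), F(1), F(4)]
X0 = [F(1)] * len(p)          # the unit constant

def prevision(Z):
    """P(Z): probability-weighted average of the values of Z."""
    return sum(pi * zi for pi, zi in zip(p, Z))

def inner(U, V):
    """The inner product (U, V) = P(UV)."""
    return prevision([u * v for u, v in zip(U, V)])

def centre(Z):
    """T(Z) = Z - P(Z) X0, the quantity corrected for the mean."""
    m = prevision(Z)
    return [z - m for z in Z]

# (X, Y)' = (T(X), T(Y)) = cov(X, Y)
cov_xy = inner(centre(X), centre(Y))

# T(X) is orthogonal to the unit constant, so it lies in the complement of A0.
assert inner(centre(X), X0) == 0
# The inner product splits into a mean part and a covariance part.
assert inner(X, Y) == prevision(X) * prevision(Y) + cov_xy
```

Exact rational arithmetic is used so that the orthogonality checks hold with equality rather than up to rounding.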
3. Adjusted belief structures

The construction of section 2 is useful in its own right, but it is also the simplest example of a very general construction. In this example, we began with a belief structure constructed around a list $C = \{X_1,\ldots,X_n\}$. We then introduced a new quantity $X_0$ into the structure and used this new quantity to split the belief structure into orthogonal subspaces. The general construction, of which this is a special case, is as follows.

(i) We begin with a belief structure $A$, constructed from a collection $C = \{X_0, X_1, X_2, \ldots\}$.

(ii) We introduce a further collection of random quantities $C' = \{Y_1,\ldots,Y_k\}$ (where some elements of $C$ and $C'$ may be the same).

(iii) We construct the belief structure $B$ from the collection $C'$ (i.e. we evaluate $(Y_i,Y_j) = P(Y_iY_j)$ for each $i$, $j$).

(iv) We now add the belief structures $A$ and $B$, to give a new belief structure $D = A + B$, spanned by the elements $\{X_1,\ldots,X_n,Y_1,\ldots,Y_k\}$ (i.e. we evaluate each $P(X_iY_j)$).

(v) We now divide the space $D$ into two orthogonal subspaces $B$ and $B^\perp$, where $B^\perp$ is the orthogonal complement of $B$ in $D$ (i.e. so that $D = B \oplus B^\perp$).
In the construction of section 2, the collection $C'$ was the single quantity $X_0$. The belief structure $B$ was the space we previously termed $A_0$. When we constructed the space $A$ by adding $A_0$ to the space spanned by $X_1,\ldots,X_n$, we evaluated $(X_0,X_i) = P(X_i)$ for each $i$. We then divided $A$ into orthogonal subspaces $A_0$ and $A_0^\perp$, where $A_0^\perp$ is the space we previously termed $A^*$.

In this case, introducing the constant $X_0$ as a subspace has separated out your beliefs so that you may, if you wish, consider variances and covariances separately from mean values. The general construction will be useful whenever the new spaces $B$ and $B^\perp$ that are created have a natural subjective interpretation. The full importance of this construction will be revealed when we consider, in subsequent articles, the revision of your beliefs. We will show that, in a certain important sense, the separation between subspaces is preserved under the revision of the inner product over the belief structure, for a wide class of choices of $B$. Further, we will identify a particular choice of space $B$ for which the above construction will essentially define the properties of your revisions of belief.

However,  for  the  present,  let  us  simply  note  that  this  is  an  interesting  construction, 
which  we  are  likely  to  use  fairly  often.  Thus,  we  introduce  a  helpful  piece  of  notation  to 
describe  the  construction. 

Definition. If $A$ and $B$ are both belief structures, then the belief structure $A$ adjusted for the belief structure $B$, written $A/B$, is defined to be the orthogonal complement $B^\perp$ of the subspace $B$ in the space $A + B$.

Thus, for example, $A^* = A/A_0$, i.e. $A^*$ is the space $A$ adjusted for the constant space. Notice that the definition is the same whether the elements of $A$ and $B$ are partially or completely distinct. The notation identifies the orthogonal complement of a space with the associated quotient space. It is suggestive of a connection between the operations of adjusting a space and the "conditioning" of random quantities. This connection will be explored after we have briefly outlined some of the properties of adjusted spaces.
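As a concrete illustration of the definition, the sketch below adjusts two quantities by a one-dimensional space. It assumes a hypothetical specification with three equally likely outcomes and a space $B$ spanned by a single quantity; the helper `adjust` is an illustrative name, not notation from the report.

```python
from fractions import Fraction as F

# Hypothetical specification: three equally likely outcomes.
p = [F(1, 3)] * 3
X1 = [F(1), F(2), F(6)]     # two elements spanning A
X2 = [F(0), F(3), F(3)]
Y = [F(1), F(1), F(4)]      # single element spanning B

def inner(U, V):
    """(U, V) = P(UV) under the probabilities p."""
    return sum(pi * u * v for pi, u, v in zip(p, U, V))

def adjust(X, b):
    """Component of X orthogonal to the one-dimensional space spanned by b."""
    coeff = inner(X, b) / inner(b, b)
    return [x - coeff * v for x, v in zip(X, b)]

# A/B is spanned by the adjusted versions of the elements spanning A,
# and each of them is orthogonal to B.
A_adj_B = [adjust(X1, Y), adjust(X2, Y)]
assert all(inner(R, Y) == 0 for R in A_adj_B)

# Adjusting by the unit constant recovers "correction for the mean":
# here P(X1) = 3, so X1 adjusted for A0 is X1 - 3.
X0 = [F(1)] * 3
assert adjust(X1, X0) == [x - F(3) for x in X1]
```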


4. Properties of adjusted belief structures


Adjusted belief structures obey a few simple rules which we list here for convenience. (The proofs are straightforward.) For any belief structures $A_1,\ldots,A_k$, $B$,

(i) $A_1/B = 0$, the zero space, if and only if $A_1 \subseteq B$.

(ii) $A_1/B = A_1$ if and only if $A_1 \perp B$.

(iii) $(A_1 + A_2)/B = (A_1/B) + (A_2/B)$.

(iv) $A_1 + A_2 + \ldots + A_k = A_1 \oplus (A_2/A_1) \oplus (A_3/(A_1 + A_2)) \oplus \ldots \oplus (A_k/(A_1 + \ldots + A_{k-1}))$.

(v) $(A_1/B) \perp (A_2/B)$ if and only if $A_1 \subseteq B + D_1$, $A_2 \subseteq B + D_2$, where $B$, $D_1$, $D_2$ are mutually orthogonal.

(Property  (iv)  is  useful  when  we  wish  to  systematically  adjust  each  of  a  collection  of 
belief  structures.  Property  (v)  is  the  key  to  the  general  representation  theorems  that  we 
shall  develop  in  later  articles.) 

The  general  properties  of  adjusted  structures  are,  however,  more  conveniently 
expressed  by  linking  each  subspace  with  the  corresponding  orthogonal  projection  into  the 
subspace. 

Notation. For any closed space $B$, we denote by $P_B(\cdot)$ the orthogonal projection operator from $A$ into $B$ (i.e., for each $X \in A$, $P_B(X)$ is the choice of element $Y \in B$ for which $\|X-Y\|$ is minimized over all $Y \in B$).

We do not require that $B$ should be a subspace of $A$. Thus, the first stage in constructing $P_B(\cdot)$ is to construct the combined space $D = A + B$. The orthogonal projection operator into $B$ is defined over $D$, and $P_B$ is the restriction of this operator to $A$ (now considered as a subspace of $D$). Notice in particular that $P_B$ is the identity operator if and only if $A \subseteq B$ and $P_B$ is the zero operator if and only if $A \perp B$.

The relationship between projections and adjusted beliefs is that, for any spaces $B_1$ and $B_2$, we have

$$P_{(B_1+B_2)} = P_{B_1} + P_{(B_2/B_1)}. \qquad (1)$$


Thus, we can add a further basic property to the properties (i) - (v) of adjusted spaces listed above, namely

(vi) For any belief structures $A$, $B_1$, $B_2$,

$$A/(B_1+B_2) = (A/B_1)/(B_2/B_1).$$

(The space $A/(B_1+B_2)$ is spanned by elements of the form $X - P_{(B_1+B_2)}X = X - P_{B_1}X - P_{(B_2/B_1)}X$. The space $A/B_1$ is spanned by elements of the form $X - P_{B_1}X$, so that $(A/B_1)/(B_2/B_1)$ is spanned by elements of the form $X - P_{B_1}X - P_{(B_2/B_1)}X + P_{(B_2/B_1)}P_{B_1}X$, which are the same as the elements of $A/(B_1+B_2)$, as $P_{(B_2/B_1)}P_{B_1}$ is the zero operator (because $B_1$, $B_2/B_1$ are orthogonal).) The simplest special case of (vi) is when $B_1 \perp B_2$, so that

(vii) $A/(B_1 \oplus B_2) = (A/B_1)/B_2 = (A/B_2)/B_1$.

It will often be useful to be able to "adjust" spaces in several stages, and so this raises a natural converse question to property (vii), namely for what spaces $B_1$, $B_2$ does

$$(A/B_1)/B_2 = (A/B_2)/B_1,$$

and when does either adjustment correspond to a single adjustment $(A/D)$ for some further space $D$? The answer is as follows.

(viii) $(A/B_1)/B_2 = (A/B_2)/B_1$ if and only if $P_{B_1}$ and $P_{B_2}$ are commuting projections (i.e. $P_{B_1}P_{B_2} = P_{B_2}P_{B_1}$).

$P_{B_1}$ and $P_{B_2}$ commute if $B_1 = B \oplus D_1$, $B_2 = B \oplus D_2$, where $B$, $D_1$, $D_2$ are mutually orthogonal. (This condition is trivially satisfied when $B_1 \perp B_2$.) In this case $P_{B_1}P_{B_2} = P_{B_2}P_{B_1} = P_B$ and $(A/B_1)/B_2 = (A/B_2)/B_1 = A/(B \oplus D_1 \oplus D_2)$.
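Properties (vi) and (viii) can be illustrated with two one-dimensional spaces that are not orthogonal. The sketch below assumes a hypothetical specification with four equally likely outcomes; `b1`, `b2` and the helper `residual` are illustrative names. It adjusts in two stages as in (vi), and then shows that adjusting by $B_2$ itself in the second stage gives a different element when the two projections do not commute.

```python
from fractions import Fraction as F

p = [F(1, 4)] * 4               # hypothetical: four equally likely outcomes
X = [F(1), F(2), F(4), F(9)]    # an element of A
b1 = [F(1), F(1), F(1), F(1)]   # spans B1
b2 = [F(0), F(1), F(2), F(5)]   # spans B2 (not orthogonal to B1)

def inner(U, V):
    """(U, V) = P(UV)."""
    return sum(pi * u * v for pi, u, v in zip(p, U, V))

def residual(U, b):
    """U minus its projection onto the line spanned by b."""
    c = inner(U, b) / inner(b, b)
    return [u - c * v for u, v in zip(U, b)]

# Stage 1: adjust X and b2 by B1, giving elements of A/B1 and B2/B1.
x_adj = residual(X, b1)
b2_adj = residual(b2, b1)
# Stage 2: adjust the element of A/B1 by B2/B1, as in property (vi).
two_stage = residual(x_adj, b2_adj)

# The result is orthogonal to both b1 and b2, so it is the component of X
# orthogonal to B1 + B2, i.e. the corresponding element of A/(B1 + B2).
assert inner(two_stage, b1) == 0
assert inner(two_stage, b2) == 0

# (A/B1)/B2 adjusts by B2 itself. Here the projections into B1 and B2 do not
# commute, and the resulting element is orthogonal to b2 but not to b1.
not_commuting = residual(x_adj, b2)
assert inner(not_commuting, b2) == 0
assert inner(not_commuting, b1) != 0
```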

Further, as $B_1$ and $(B_2/B_1)$ are orthogonal spaces, we can automatically decompose the inner product over $B_1 + B_2$. For any $X, Y \in A$, we have

$$(P_{(B_1+B_2)}(X), P_{(B_1+B_2)}(Y)) = (P_{B_1}(X), P_{B_1}(Y)) + (P_{(B_2/B_1)}(X), P_{(B_2/B_1)}(Y)). \qquad (2)$$

A special case of this decomposition which we will frequently use follows from setting $B_1 = B$, $B_2 = A$. Thus, $P_{(B+A)}$ is the identity operator on $A$ and so each choice of space $B$ resolves each element $X \in A$ into two orthogonal components, as

$$X = P_B(X) + P_{A/B}(X), \qquad (3)$$

so that the inner product over $A$ is decomposed as

$$(X,Y) = (P_B(X),P_B(Y)) + (P_{A/B}(X),P_{A/B}(Y)). \qquad (4)$$

Thus, when you construct the inner product structure over $A$, you may separately consider your inner product structure $B$ and your belief structure $(A/B)$, and these assessments must combine to determine your belief structure $A$. This will provide a wide variety of coherence checks for your assessments. (We will see in later articles that the most important coherence checks will be associated with your revisions of belief over $B$ and over $(A/B)$.) These checks will be relevant when either $B$ or $(A/B)$ or both have a natural interpretation. We have seen one example already, namely in section 2. Here $A^* = A/A_0$, and

$$P_{A_0}(X) = P(X)X_0, \qquad P_{A^*}(X) = X - P(X)X_0,$$

so that relation (4) becomes

$$(X,Y) = P(X)P(Y) + \mathrm{cov}(X,Y),$$

i.e. we have decomposed the inner product into a separate consideration of means and covariances.

In the previous report on "Belief Structures" we noted the fundamental relationship between prevision and projection. Notice that we can numerically identify the prevision of $X$, $P(X)$, with the particular projection $P_{A_0}(X)$. Further, the projection $P_{A_0}$ is simply a particular case of the general projection. $P_B$ is, in a sense, a generalisation of conditional prevision, with the space $B$ acting analogously to the "conditioning" random quantities. (We will make the relationship precise in the next subsection.) Just as our choice of notation $P(\cdot)$ allows us to pass interchangeably between probabilities and expectations, so it also allows us to pass interchangeably between previsions and projections.

The space $A/A_0$ describes the variances and covariances of the elements of $A$, i.e. the variation in the quantities around the plane of certainty. In the same way $A/B$ summarizes the variation in the elements of $A$ when we have taken account of the variation in $B$. In a sense $P_{A/B}$ gives the "residual vectors" for the "fitted regression" $P_B$.


For completeness, we now record the basic formulae relating to projections. We introduce the following piece of notation. Suppose that $a_1,\ldots,a_k$, $b_1,\ldots,b_r$ are elements of the inner product space $V$. Suppose we write $\underline{a} = (a_1,\ldots,a_k)$, $\underline{b} = (b_1,\ldots,b_r)$. Then we shall denote by $(\underline{a}|\underline{b})$ the $k \times r$ matrix whose $(i,j)$th term is $(a_i,b_j)$. With this notation, for any finite dimensional subspace $B$ and any basis of $B$, $b_1,\ldots,b_k$, the projection operator $P_B$ can be written, for each $X$, as

$$P_B(X) = \underline{b}\,(\underline{b}|\underline{b})^{-1}(\underline{b}|X),$$

where $\underline{b} = (b_1,\ldots,b_k)$.

Further, the squared distance between $X$ and $P_B(X)$ is given by the ratio of two determinants, as

$$\|P_{A/B}(X)\|^2 = \|X - P_B(X)\|^2 = \frac{|(\underline{b}(X)|\underline{b}(X))|}{|(\underline{b}|\underline{b})|},$$

where $\underline{b}(X)$ is the vector $(X, b_1,\ldots,b_k)$.

If $b_1,\ldots,b_k$ are an orthogonal basis (i.e. $(b_i,b_j) = 0$, $i \neq j$), then the above formulae simplify to give

$$P_B(X) = \sum_i \frac{(b_i,X)}{(b_i,b_i)}\,b_i, \qquad \|P_{A/B}(X)\|^2 = (X,X) - \sum_i \frac{(b_i,X)^2}{(b_i,b_i)}. \qquad (5)$$
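The matrix formula and the determinant ratio can be checked on a small case. The following is a sketch under assumed data: four equally likely outcomes and a two-element basis that is not orthogonal. The helpers `det` and `gram` are illustrative; the linear system $(\underline{b}|\underline{b})c = (\underline{b}|X)$ is solved by Cramer's rule only because the example is tiny.

```python
from fractions import Fraction as F

p = [F(1, 4)] * 4               # hypothetical: four equally likely outcomes
X = [F(1), F(2), F(4), F(9)]
b1 = [F(1), F(1), F(1), F(1)]
b2 = [F(0), F(1), F(2), F(5)]

def inner(U, V):
    """(U, V) = P(UV)."""
    return sum(pi * u * v for pi, u, v in zip(p, U, V))

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = F(0)
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def gram(vectors):
    """The matrix (v|v) of pairwise inner products."""
    return [[inner(u, v) for v in vectors] for u in vectors]

basis = [b1, b2]
G = gram(basis)                             # (b|b)
rhs = [inner(b, X) for b in basis]          # (b|X)

# Solve (b|b) c = (b|X) by Cramer's rule; c holds the coefficients of P_B(X).
c = []
for j in range(len(basis)):
    Gj = [row[:j] + [rhs[i]] + row[j + 1:] for i, row in enumerate(G)]
    c.append(det(Gj) / det(G))

proj = [sum(cj * b[i] for cj, b in zip(c, basis)) for i in range(len(X))]
resid = [x - q for x, q in zip(X, proj)]

# The squared residual norm equals the ratio of the two Gram determinants.
assert inner(resid, resid) == det(gram([X] + basis)) / det(G)
```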


Finally, the following structural properties of projection operators will be important in later developments.

(i) Projections are idempotent (i.e. $P_B^2 = P_B$) (more generally, $P_BP_D = P_B$ if $B \subseteq D$).

(ii) Projections are self-adjoint, i.e. for any $X, Y \in A + B$,

$$(X,P_BY) = (P_BX,Y).$$

(Note an operator is a projection if and only if it is idempotent and self-adjoint.)

An important consequence of (i) and (ii) is that for any $X, Y \in A + B$,

(iii) $(X,P_BY) = (X,P_BP_BY) = (P_BX,P_BY) = (P_BX,Y)$,

which, for example, gives a direct demonstration of the relationship

$$\|P_{A/B}(X)\|^2 = \|X\|^2 - \|P_BX\|^2.$$

(iv) $P_B$ is a bounded linear operator, $\|P_B\| = 1$ over $(A + B)$, so that the restriction of $P_B$ to $A$ has norm not greater than one. (For any linear transformation $T$, $\|T\| = \sup_{X \neq 0} \|TX\|/\|X\|$.)

5.  Conditional  prevision 

We will now discuss the formal relationship between conditional beliefs and adjusted belief spaces. Thus, we begin by briefly reviewing the notion of conditional probability, or in the present case conditional prevision, which De Finetti defines as follows.

Definition. The conditional prevision of the random quantity $X$, given event $H$, written $P(X|H)$, is the value $x$ that you would choose if, having made this choice, you were to suffer a penalty $L$ given by

$$L = K H (X-x)^2,$$

where $K$ defines the units of loss and $H$ is the indicator function for the event $H$. (In other words, we have a "called-off" penalty, which is only invoked if $H$ occurs.)

The  point  to  observe  about  the  definition  is  that  it  appears  to  make  events  "special" 
again.  That  is,  having  argued,  in  constructing  the  belief  structure  in  our  previous 
report,  why  we  do  not  need  to  distinguish  at  a  fundamental  level  between  probability  and 
expectation,  we  have  now  introduced  a  definition  which  only  makes  sense  when  the 
conditioning  random  quantity  H  is  a  two  valued  random  quantity.  If  this  is  really  the 
case,  and  if  our  new  definition  is  actually  necessary  to  our  subsequent  development,  then 
this  suggests  that  such  a  distinction  is  indeed  crucial  to  the  theory.  Further  it  suggests 
that the problems that we intended to overcome by working with expectation rather than probability are actually unavoidable, as even if we can avoid constructing an exhaustive collection of outcomes for the primary quantities of interest, we will still be forced to reduce observational evidence into a partition (which is an even more daunting prospect, as at least you are free to choose your primary quantities of interest, but the "data" is far less under your control). What we will argue in detail in subsequent articles is that events are not "special", and that the restriction of the definition of conditional prevision to events is precisely as arbitrary as would be a restriction of the definition of prevision itself to events.

The coherence condition that De Finetti imposes is that you do not prefer a given penalty if you can choose a different penalty which is certainly smaller. De Finetti shows that the necessary and sufficient condition for coherence in evaluating $P(X|H)$, $P(XH)$ and $P(H)$ is that

$$P(HX) = P(X|H)P(H),$$

in addition to the inequality $\inf(X|H) \le P(X|H) \le \sup(X|H)$, where the inf and sup are over all values of $X$ consistent with $H$. (Notice that if $X$ is itself an event, then the above condition is the usual theorem of compound probabilities.)

Observe in particular that $P(X|H)$, by definition, expresses your choice made now, before $H$ has been revealed, when confronted with a penalty in $X$ which will be called off unless $H$ occurs. It is very easy to twist this around and declare that if you discover that $H$ has occurred, then your prevision for $X$ "should" become the value that you have assigned for $P(X|H)$. Indeed, all of Bayesian statistics is based around this "principle". We will discuss in detail in a later article precisely why this view is misguided. For now, let us simply observe that the called-off penalty definition of conditional prevision does not, of itself, say anything concerning your future beliefs. Thus, any linkage between conditional prevision and future beliefs is not self-evident, but requires additional justification.

Finally, let us briefly outline a useful property of conditioning. Consider any finite partition of possibilities, i.e. a set $E_1,\ldots,E_k$ of events such that one and only one of the events will occur. We may define the prevision of $X$ conditional on the partition $\Pi = \{E_1,\ldots,E_k\}$ as

$$P(X|\Pi) = P(X|E_1)E_1 + \ldots + P(X|E_k)E_k.$$

(In other words, $P(X|\Pi)$ is the random quantity which takes value $P(X|E_i)$ if $E_i$ occurs.) Thus

$$P(P(X|\Pi)) = P(X|E_1)P(E_1) + \ldots + P(X|E_k)P(E_k) = P(XE_1) + \ldots + P(XE_k) = P(X(E_1 + \ldots + E_k)) = P(X),$$

as $E_1 + \ldots + E_k = 1$.

If $Y$ is the random quantity which takes a finite number of possible values $y_1,\ldots,y_k$, then we can similarly write $P(X|Y) = P(X|\Pi)$, where the partition is over the events $\{Y = y_i\}$, and again

$$P(P(X|Y)) = P(X).$$
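The identity $P(P(X|\Pi)) = P(X)$ for a finite partition can be checked on a small case. The sketch below assumes a hypothetical specification over six outcomes grouped into three partition events, and uses the coherence relation $P(X|E) = P(XE)/P(E)$; the names `block`, `indicator` and `conditional` are illustrative.

```python
from fractions import Fraction as F

# Hypothetical specification over six outcomes.
p = [F(1, 12), F(1, 6), F(1, 4), F(1, 4), F(1, 6), F(1, 12)]
X = [F(3), F(0), F(4), F(8), F(6), F(12)]
# Label of the partition event (E_1, E_2 or E_3) that contains each outcome.
block = [1, 1, 2, 2, 2, 3]

def prevision(Z):
    """P(Z): probability-weighted average of the values of Z."""
    return sum(pi * zi for pi, zi in zip(p, Z))

def indicator(k):
    """Indicator function of the partition event E_k."""
    return [F(1) if b == k else F(0) for b in block]

def conditional(Z, E):
    """P(Z | E) = P(ZE) / P(E) for an event with indicator E."""
    return prevision([z * e for z, e in zip(Z, E)]) / prevision(E)

labels = sorted(set(block))
cond = {k: conditional(X, indicator(k)) for k in labels}

# P(X | Pi): the random quantity taking the value P(X | E_k) when E_k occurs.
X_given_partition = [cond[b] for b in block]

assert prevision(X_given_partition) == prevision(X)
```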

Does this relationship hold if we allow $Y$ to take an infinite number of possible values? Yes, if $X$ is bounded and $\sum_{i=1}^{\infty} P(E_i) = 1$ (as we can define $F_n = E_1 + \ldots + E_n$, $G_n = 1 - F_n$, and write

$$P(X) = P(X|F_n)P(F_n) + P(X|G_n)P(G_n).$$

The second term on the right hand side tends to zero, while the first tends to $P(P(X|Y))$).

However, in general, when we drop the property of countable additivity over the partition, the property $P(P(X|Y)) = P(X)$ need not hold. (This is termed non-conglomerability.)


6.  Adjusted  belief  structures  and  conditional  beliefs 

In our treatment of belief structures, we observed that the relationship between prevision and projection is implicit in the definition of prevision. We will now make a similar identification between conditional prevision and the more general projection operator.

Thus, consider a general operator $P_B$, where $B$ is spanned by the finite collection of elements $B_1, B_2, \ldots, B_k$. By definition $P_B(X)$ is the linear combination $c_1B_1 + \ldots + c_kB_k$, where the coefficients are chosen to minimise

$$P\big((X - (d_1B_1 + \ldots + d_kB_k))^2\big)$$

over all choices of $d_1,\ldots,d_k$. Preferring penalty $A$ to penalty $B$ corresponds to $P(A) < P(B)$. Thus, we may interpret the values $c_1,\ldots,c_k$ as the values which you would choose if you were subsequently to suffer the penalty

$$L = (X - (c_1B_1 + \ldots + c_kB_k))^2.$$

Now consider the definition of conditional prevision, $P(X|H)$. You are required to choose your preferred penalty $H(X-d)^2$, over choices of $d$, where $H$ is the indicator function of the corresponding event. This looks somewhat different from the choice you have to make in assessing $P_B$. However, observe that when you assess $P(X|H)$, you are also, by coherence, implicitly making a further assessment of $P(X|H_c)$, where $H_c = 1 - H$, the complement of $H$, as

$$P(X) = P(X|H)P(H) + P(X|H_c)(1 - P(H))$$

(so that if you specify $P(X|H)$, $P(X)$, $P(H)$, then this determines the value of $P(X|H_c)$). Thus, when you consider the penalty $L = H(X-d)^2$, you also implicitly make an assessment for the penalty $L_c = H_c(X-d_c)^2$.

Thus an equivalent formulation of the definition of conditional prevision is that you must specify two values $d$ and $d_c$ and you will incur a penalty

$$L^* = H(X-d)^2 + H_c(X-d_c)^2.$$

Now as

$$H + H_c = 1, \quad HH_c = 0, \quad H^2 = H, \quad H_c^2 = H_c,$$

the above penalty may be identically rewritten as

$$L^* = (X - dH - d_cH_c)^2.$$
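The rewriting of the penalty can be verified directly from the four indicator identities. A sketch of the algebra, writing $H_c$ and $d_c$ for the complement indicator and its associated value:

```latex
\begin{align*}
X - dH - d_cH_c &= (X-d)H + (X-d_c)H_c && \text{since } H + H_c = 1, \\
(X - dH - d_cH_c)^2 &= (X-d)^2H^2 + 2(X-d)(X-d_c)HH_c + (X-d_c)^2H_c^2 \\
 &= H(X-d)^2 + H_c(X-d_c)^2 && \text{since } H^2 = H,\ H_c^2 = H_c,\ HH_c = 0 .
\end{align*}
```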

From the discussion at the beginning of this section it is clear that your choice of values $d = P(X|H)$, $d_c = P(X|H_c)$ is equivalent to your choice of the element $dH + d_cH_c$ which is the projection of $X$ into $\mathcal{H}$, the subspace spanned by $H$, $H_c$, i.e.

$$P_{\mathcal{H}}(X) = P(X|H)H + P(X|H_c)H_c.$$

Notice that because $P(X|H)$ is the coefficient of $H$ in the projection of $X$ into $\mathcal{H}$, we can immediately deduce the usual formula for conditional prevision from the standard formula (5) for the coefficients of the projection operator.

As $H$ and $H_c$ are orthogonal vectors, the coefficient of $H$ in $P_{\mathcal{H}}(X)$ is given by $(X,H)/(H,H)$. As $H^2 = H$, we have that

$$P(X|H) = P(XH)/P(H),$$

as required.

Thus, directly from the definition, conditional prevision on $H$ is simply the projection into the subspace $\mathcal{H}$ spanned by $H$ and $H_c$. (Equivalently, $\mathcal{H}$ is the space spanned by $H$ and $X_0$, the unit constant, which explains why your specification of $P(X|H)$ fixes $P(X|H_c)$.)
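The identification of conditional prevision with projection into the space spanned by $H$ and its complement can be checked numerically. The following is a minimal sketch, assuming a hypothetical four-outcome specification; the probabilities, the values of `X` and the variable names are illustrative only.

```python
from fractions import Fraction as F

p = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]   # hypothetical probabilities
X = [F(5), F(0), F(2), F(1)]
H = [F(1), F(1), F(0), F(0)]                   # indicator of the event H
Hc = [1 - h for h in H]                        # indicator of the complement

def prevision(Z):
    """P(Z): probability-weighted average of the values of Z."""
    return sum(pi * zi for pi, zi in zip(p, Z))

def inner(U, V):
    """(U, V) = P(UV)."""
    return prevision([u * v for u, v in zip(U, V)])

# H and Hc are orthogonal, so the orthogonal-basis formula gives the
# coefficients of the projection directly.
d = inner(X, H) / inner(H, H)
dc = inner(X, Hc) / inner(Hc, Hc)
projection = [d * h + dc * hc for h, hc in zip(H, Hc)]

# The coefficient of H is the conditional prevision P(XH) / P(H).
assert d == prevision([x * h for x, h in zip(X, H)]) / prevision(H)

# The residual is orthogonal to H and to the unit constant X0 = H + Hc.
resid = [x - q for x, q in zip(X, projection)]
X0 = [F(1)] * len(p)
assert inner(resid, H) == 0 and inner(resid, X0) == 0

# Taking the prevision of the projection returns P(X).
assert prevision(projection) == prevision(X)
```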

Of course, if we had first established the relationship $P(X|H) = P(XH)/P(H)$, then we could simply reverse the above argument and deduce the relationship between conditional prevision and projection. Thus, the relationship is not so much a property of our particular choice of definition, but corresponds to any definition which yields the familiar formula for conditional prevision. However, we have preferred to take a formulation in which everything can be immediately deduced simply from a careful statement of the definition itself.

In precisely the same way, if $\Pi = (E_1,\ldots,E_k)$ is a partition, then letting $\Pi$ also represent the belief structure spanned by the random quantities $E_1,\ldots,E_k$, we have for any $X$,

$$P_\Pi(X) = P(X|E_1)E_1 + \ldots + P(X|E_k)E_k.$$

Notice that $P_\Pi(X)$ is numerically equivalent to the quantity which we termed $P(X|\Pi)$ in section 5. In this sense we move interchangeably between conditional prevision and projection. Notice that this gives a geometric interpretation as to why $P(P_\Pi(X)) = P(X)$. That is, as $E_1 + \ldots + E_k = X_0$, $X_0$ is an element of $\Pi$, so that $A_0 \subseteq \Pi$. Thus, if you determine $P(X)$ by projecting $X$ directly into $A_0$, or by first projecting $X$ into $\Pi$ and then into $A_0$, you will obtain the same result in either case. This is simply a special case of the general property $P_BP_D = P_B$ if $B \subseteq D$.

When we discussed conditional prevision above we observed that it was very disturbing that we appeared to need such a definition, because it appeared to give events a special status that we were anxious to avoid. (For example, in our general discussion of belief structures we argued against the requirement that we should be forced into a full probabilistic specification for all quantities of interest.)

We have now completed the first step in dispensing with the idea of conditional prevision, namely we have shown that the definition itself does not introduce a new concept into our system, but simply identifies a particular type of projection operator. We may now repeat essentially the same general argument as when we observed that it would be an arbitrary restriction to say that there was something "special" about the prevision of an indicator function, i.e. that there was no logical distinction between your consideration of the penalty $(X-x)^2$ when $X$ was a two valued quantity or when $X$ was a many valued quantity.

In the same way, when you consider your choice of penalty $(X - c_1B_1 - \ldots - c_kB_k)^2$, over choices of $c_1,\ldots,c_k$, there is no logical distinction between your choice when $B_1,\ldots,B_k$ happen to be the indicator functions for the events of a partition and your choice when $B_1,\ldots,B_k$ are any general random quantities.

Again this is quite separate from the psychological question as to which choices you personally prefer to consider. You may well find it convenient to work with conditional probabilities in certain situations. Notice that if you specify conditional probabilities directly, and deduce various unconditional probabilities from the coherence relations, then this corresponds to a direct specification of the projections into certain subspaces, and construction of the inner product space in such a way as to be consistent with the projections. This illustrates the argument of section 4, namely that the various adjustments of a belief structure offer you a variety of different, but consistent, approaches to specifications of your beliefs, and that you should choose the most intuitively meaningful approach for the problem at hand.

It still remains to be considered whether the projection into a subspace spanned by indicator functions has some property (logical, not psychological) which distinguishes it from the more general projection. In particular, it might be thought that the interpretation of a projection into an indicator space is "special", because it corresponds to the usual Bayesian approach of revising your beliefs by conditioning on observed events (whereas there is no such obvious correspondence for the general projection).

In later articles we will show that such a distinction is entirely without foundation
(at least, in our approach). For now, let us emphasise again that everything that we have
said concerning your conditional previsions, and the corresponding projections, relates to
probabilistic relationships (expressed now) between various random quantities. There is,
as yet, absolutely no implication in anything that we have said to suggest any relationship
whatever between your expressed beliefs at different time points. Such relationships will
turn out to be the crucial feature of the theory of belief structures. However, because
this issue requires very careful consideration, we will defer it completely to later
articles when we have laid the necessary groundwork for a proper treatment. Thus, for
example, the conditional prevision $P(X \mid H)$ bears, as yet, no relationship with the value
that you may express for $P(X)$ if you learn that $H$ occurs. Do not imagine intrinsic
properties of quantities without providing careful justification.

Finally, let us extend the link between belief structures and full Bayesian
specifications to cover the general adjusted belief space. Thus, we begin with the space
$A = L^2(\Omega, P)$ (i.e. the space of all square integrable functions on $\Omega$ under the
usual $L^2$ inner product with respect to the probability measure $P$). We introduce the new
space $B = L^2(S, Q)$, where $Q$ is a probability measure over the probability space $S$. The
projection operator $P_B$ is "conditional expectation", that is for any element $g \in A$,
$P_B(g)$ is the element of $B$ defined pointwise for each $s \in S$ by

$$(P_B g)(s) = \int_\Omega g(\omega)\, dP(\omega \mid s).$$

(The conditioning is with respect to the joint probability distribution on $\Omega \times S$,
which essentially creates the space $A + B$.)

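In more familiar terms, $\Omega$ is usually the "parameter space" and $S$ is the "sample
space". Typically there is a joint p.d.f. over $\Omega \times S$ composed as
$f(s, \omega) = f(s \mid \omega)p(\omega)$, where $p(\omega)$ is the prior density for the
parameter $\omega$, and $f(s \mid \omega)$ is the likelihood function. The above integral thus
reduces to

$$(P_B g)(s) = \frac{\int g(\omega) f(s \mid \omega) p(\omega)\, d\omega}{\int f(s \mid \omega) p(\omega)\, d\omega}.$$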
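As a rough numerical illustration of this ratio, the sketch below replaces the integrals by sums over a small discrete parameter set. Everything specific in it is an assumption made for the example: the three parameter values, the prior weights, the binomial likelihood with two trials, and the names `likelihood` and `project`.

```python
from math import comb

# Hypothetical discrete stand-in for the parameter space (assumed values).
omegas = [0.2, 0.5, 0.8]
prior = [0.3, 0.4, 0.3]      # p(omega)
n_trials = 2                 # sample space S = {0, 1, 2} successes

def likelihood(s, omega):
    """f(s | omega): binomial probability of s successes in n_trials."""
    return comb(n_trials, s) * omega**s * (1 - omega)**(n_trials - s)

def project(g, s):
    """(P_B g)(s): ratio of sums, the discrete analogue of the ratio of integrals."""
    numerator = sum(g(w) * likelihood(s, w) * p for w, p in zip(omegas, prior))
    denominator = sum(likelihood(s, w) * p for w, p in zip(omegas, prior))
    return numerator / denominator

def g(w):
    """The quantity being projected: here, the parameter itself."""
    return w

sample_points = range(n_trials + 1)
posterior_means = {s: project(g, s) for s in sample_points}
marginal = {s: sum(likelihood(s, w) * p for w, p in zip(omegas, prior))
            for s in sample_points}

# Averaging the projection over the marginal for s recovers the prior prevision of g.
averaged = sum(marginal[s] * posterior_means[s] for s in sample_points)
prior_prevision = sum(g(w) * p for w, p in zip(omegas, prior))
```

Here `averaged` and `prior_prevision` agree up to rounding, which is what one expects when the residual $g - P_B g$ is orthogonal to the constant function in $B$.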



7.  Adjusted  belief  construction 

As a simple example, to motivate the next stage in our development, let us return to
the example discussed in the previous article. Suppose that the teacher decides to
restrict the belief space $A$ to quantities for which he is fundamentally interested in
specifying his beliefs. In this case, that will be the various numerical measures of the
student's performance in the coming year. He places all of the other quantities, such as
the previous year's test scores, into a second belief structure $B$ (i.e. $B$ contains all
of those quantities about which he has no interest in specifying belief except inasmuch as
this will suggest or explain or clarify certain of his beliefs over $A$).

This separation will be an important feature of our further development. Thus, let us
term $A$ the primary belief structure and $B$ the support structure. These terms do not
reflect fundamental aspects of the random quantities involved (for example, some functions
of a particular random quantity may be elements of $A$, and other functions of the same
random quantity may be elements of $B$). Instead the terms reflect the external objectives
for which the specification is being made. We will term $V = A + B$ the total belief
structure (i.e. $V$ is the structure which, in principle, delimits the arguments which can
be made).

As a simple example, suppose that the teacher has constructed the primary space $A$
spanned by $X_0 = 1$, $X_1 = S$, and the support space $B$ spanned by $\{X_0, Y\}$ ($S$ is the
score for the coming test, $Y$ is the score for a comparable earlier test, though the teacher
has not yet seen the value of $Y$). He then constructs the space $A/B$. Thus, he has
assessed values for $P(S)$, $P(S^2)$, $P(Y)$, $P(Y^2)$ and $P(YS)$. He constructs the
projection $P_B$ by using the relation $P_B = P_{B_0} + P_{B/B_0}$ (with $B_0$ the subspace
spanned by $X_0$) and formula (5) to give the familiar "least squares" formulae

$$P_B(S) = P(S)X_0 + \frac{\mathrm{cov}(S, Y)}{\mathrm{var}(Y)}\,(Y - P(Y)X_0)$$

and

$$\|P_{A/B}(S)\|^2 = \|S - P_B(S)\|^2 = \mathrm{var}(S) - \frac{\mathrm{cov}^2(S, Y)}{\mathrm{var}(Y)}. \qquad (6)$$

As we have emphasized above, $P_B$ is simply a generalisation of $P_{B_0}$, that is from a
numerically fixed prevision (with associated penalty $(S - P(S))^2$) to a numerically random
prevision (with penalty $(S - P_B(S))^2$). If you express a strong preference for $P_B(S)$
over $P_{B_0}(S)$ (as quantified by a large value of
$\|P_{A/B_0}(S)\|^2 - \|P_{A/B}(S)\|^2$, or equivalently a large value of
$\mathrm{cov}^2(S, Y)/\mathrm{var}(Y)$), then this information is not qualitatively
different from announcing a strong preference for one value of $P(S)$ over another (for
example, preferring a penalty $(S - 1)^2$ to a penalty $(S - 5)^2$). It remains for you to
interpret your preference in the context of the problem under consideration. In this
case, strong preference for $P_B(S)$ over $P_{B_0}(S)$ might suggest the use of $Y$ to
"predict" $S$, but we will consider this in detail in subsequent articles. For now, we
view $P_B(S)$ simply as a "random prevision", asserted now.
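The least squares formulae (6) need only the five assessed previsions. The following sketch shows the arithmetic under that assumption; the function name `adjust_by_linear_support` and the numerical assessments are hypothetical and serve only to illustrate the calculation.

```python
def adjust_by_linear_support(p_s, p_s2, p_y, p_y2, p_ys):
    """Apply the least squares formulae (6) to assessed previsions.

    Inputs are P(S), P(S^2), P(Y), P(Y^2) and P(YS). Returns the projection
    P_B(S) as a function of the (as yet unobserved) value of Y, together
    with the squared norm of the residual S - P_B(S).
    """
    var_s = p_s2 - p_s**2
    var_y = p_y2 - p_y**2
    cov_sy = p_ys - p_s * p_y
    slope = cov_sy / var_y

    def projection(y):
        # P(S) X_0 + (cov(S, Y) / var(Y)) (Y - P(Y) X_0), evaluated at Y = y
        return p_s + slope * (y - p_y)

    residual_norm_sq = var_s - cov_sy**2 / var_y
    return projection, residual_norm_sq

# Hypothetical assessments by the teacher (assumed numbers).
projection, residual = adjust_by_linear_support(
    p_s=60.0, p_s2=3700.0, p_y=55.0, p_y2=3125.0, p_ys=3360.0
)
fixed_penalty = 3700.0 - 60.0**2       # var(S): penalty for the fixed prevision P(S)
reduction = fixed_penalty - residual   # equals cov^2(S, Y) / var(Y)
```

With these assumed numbers the variance of $S$ is 100 and the residual squared norm is 64, so the "random prevision" reduces the expected penalty by 36.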

To illustrate this interpretation, consider the following calculation. Suppose that
there is a critical "pass-fail" level $s$ for $S$, and let $I_S = 1$ if $S > s$ and $I_S = 0$
otherwise. Define $I_Y = 1$ if $Y > y$ (a corresponding critical value), and $I_Y = 0$
otherwise. The obvious quantity to consider is the conditional probability that you will
pass test $S$ given that you have passed test $Y$, i.e. to evaluate $P(S > s \mid Y > y)$, or
equivalently to evaluate $P(I_S I_Y)/P(I_Y)$.

Thus, suppose you assess $P(I_S) = p$, $P(I_Y) = q$, $P(I_S I_Y) = u$, so that
$P(S > s \mid Y > y) = u/q$. The larger the value of $u/q$ compared to $p$, the more
"relevant" it may be to observe the event $Y > y$. How can we express this?

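From the discussion of section 6 the conditional probability argument can be set in
the inner product space. Thus let $A$ be spanned by $X_0$, $I_S$, and $B$ by $X_0$, $I_Y$. The
values $p$, $q$, $u$ fully specify the total space $A + B$. Applying formulae (6) we have

$$P_B(I_S) = P(I_S)X_0 + \frac{\mathrm{cov}(I_S, I_Y)}{\mathrm{var}(I_Y)}\,(I_Y - P(I_Y)X_0) = pX_0 + \frac{u - pq}{q - q^2}\,(I_Y - qX_0)$$

and

$$\|P_{A/B}(I_S)\|^2 = \mathrm{var}(I_S) - \frac{\mathrm{cov}^2(I_S, I_Y)}{\mathrm{var}(I_Y)} = p - p^2 - \frac{(u - pq)^2}{q - q^2}. \qquad (7)$$

$P_B(I_S)$ takes 2 possible values. If $Y > y$, i.e. $I_Y = 1$, then numerically

$$P_B(I_S) = u/q = P(S > s \mid Y > y)$$

and if $I_Y = 0$, then numerically

$$P_B(I_S) = P(S > s \mid Y \le y),$$

so that, with $I_{\bar{Y}} = 1 - I_Y$, we have

$$P_B(I_S) = I_Y P(I_S \mid I_Y) + I_{\bar{Y}} P(I_S \mid I_{\bar{Y}}).$$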
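The two numerical values of the projection can be verified directly from $p$, $q$ and $u$. The sketch below uses exact rational arithmetic; the particular values of $p$, $q$, $u$ and the function name `indicator_projection` are assumptions chosen for illustration.

```python
from fractions import Fraction

def indicator_projection(p, q, u):
    """Least squares projection of I_S on {X_0, I_Y}, as a function of I_Y (0 or 1)."""
    cov = u - p * q          # cov(I_S, I_Y)
    var_y = q * (1 - q)      # var(I_Y)

    def projection(i_y):
        return p + cov / var_y * (i_y - q)

    return projection

# Hypothetical assessments: P(I_S) = p, P(I_Y) = q, P(I_S I_Y) = u.
p, q, u = Fraction(6, 10), Fraction(1, 2), Fraction(4, 10)
projection = indicator_projection(p, q, u)

given_pass = projection(1)   # value taken when Y > y
given_fail = projection(0)   # value taken when Y <= y

# The same values obtained as ordinary conditional probabilities.
cond_pass = u / q
cond_fail = (p - u) / (1 - q)
```

For these assumed values the projection takes the values 4/5 and 2/5, which coincide with the conditional probabilities computed in the usual way, and their average weighted by $q$ and $1 - q$ returns $p$.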

This is of course the formula for projection from section 6. We have derived it again
to emphasise that it is precisely the same equation as (6), derived in the same way (i.e.
through (7)) and with the same justification (in terms of choosing a "random" prevision) as
for (6), but simply applied to two-valued random quantities, rather than many valued random
quantities. Just as the norm of the residual vector in (6) plays an important role in
determining the value of $Y$ in assessing $S$, so does the corresponding norm in (7)
relate to the value of $I_Y$ in assessing $I_S$.

It is up to you whether you want to consider the quantities summarised in (6) or
(7). All that we have observed is that if, for example, you specify the values for $P(I_S)$,
$P(I_Y)$ and $P(I_S I_Y)$, then the theory will determine for you the quantities in (7) (and
nothing else). It is up to you in any particular problem to decide what quantities you
wish to determine. All that theory can provide is an organizing framework in which the
implications of your specifications can be clearly displayed. That framework concerns the
analysis of belief transformations over $A$, and will be the subject of our next article.